quick tips - READ ME (v.244+)   (text file, 1987-05-14)
Welcome to Browser v.244! This program enables you to make and browse
indices to very large collections of free-text data. Read the articles
accompanying the program for comments on design philosophy, algorithms,
data structures, etc.
-----------------------------------------------------------------------------
This is the help file for Browser v.244. If you have used Browser v.223 or
earlier versions, please read the warning at the end of this help file!
-----------------------------------------------------------------------------
In general:
* if you have problems, try running without a RAM Cache (turn off from
the control panel, and reboot) and make sure you have enough memory
allocated for the program (probably at least 500 KB).
* use DivJoin, Microsoft Word, or other such programs to join multiple small
text files together to make suitably-massive files for indexing.
* when all else fails, call me up and I'll try to help....
-----------------------------------------------------------------------------
Under the "Browse" menu:
* select "Open..." to open a previously-created index file for browsing
- click on a line in the Index window to call up all occurrences of
that term in the Context window
- click on a line in the Context window to call up the full text of
the database in the vicinity of that line
- select a longer text length from the menu if speed in opening the
text window is less essential, or if more text around a target
is desired
- type into the Index window to jump to a given target location (takes
a few seconds, longer from floppy)
- use the Subindex commands to work within a subset of the entire file
(very useful for large databases)
- "Empty" empties out the working subindex, "Fill" fills it up again (the
default startup condition), and "Invert" does a boolean NOT operation
(so anything that was in the subindex leaves it, and everything that
was not in the subindex is now included in it)
- hold down the shift key (cursor turns into a "+") and click on items to
add their neighborhoods to the working subindex (boolean OR operation);
hold down the option key (cursor turns into a "-") and click on items
to remove their neighborhoods from the working subindex (boolean AND NOT)
- the Subindex Proximity choices allow you to control the neighborhood or
region of influence around shift-click and option-click selections for
the working subindex. "Words" (the default) selects neighborhoods of
half a dozen words or so around each click (actually, it selects all
terms within 32 bytes of the chosen item, and some terms out as far
as 64 bytes ... read the tech notes or subindex source code for the
gory details). "Sentences" selects neighborhoods within a few sentences
of the selected items, and "Paragraphs" selects neighborhoods within
a few paragraphs.
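
As a rough illustration of what the Index and Context windows show, here is a
minimal Python sketch of an inverted index with a keyword-in-context lookup.
It is not Browser's own code: the function names, the upper-casing of terms,
the word pattern, and the 30-character context width are all assumptions made
for the illustration.

```python
import re
from collections import defaultdict

def build_index(text):
    """Map each upper-cased word to the character offsets where it occurs."""
    index = defaultdict(list)
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        index[match.group().upper()].append(match.start())
    return index

def context_lines(text, index, term, width=30):
    """Return one snippet per occurrence of term (hypothetical Context rows)."""
    snippets = []
    for pos in index.get(term.upper(), []):
        start = max(0, pos - width)
        end = min(len(text), pos + width)
        # Flatten line breaks so each occurrence fits on one display line.
        snippets.append(text[start:end].replace("\n", " "))
    return snippets

sample = ("MacWrite stores its format in a header.\n"
          "The format of MacWrite files changed.")
sample_index = build_index(sample)
for line in context_lines(sample, sample_index, "format"):
    print(line)
```

Clicking a term in the Index window corresponds to the context_lines() call:
one short snippet is produced for each place the term occurs.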
EXAMPLE:
You have indexed up the past year of on-line sessions you've had, and want
to recall items concerned with the format of MacWrite documents. Choose
"Open..." from under the "Browse" menu and open the already-indexed file.
Scroll to "MACWRITE" in the index ... it occurs 2,345 times, far too many to
effectively browse through. So, choose "Empty" to clear out the working
subindex and then shift-click on MACWRITE. Now all 2,345 neighborhoods of
the occurrences of MACWRITE are marked as valid. Scroll the index window
to FORMAT and see that only 2 out of the 987 occurrences of FORMAT occurred
within a few words of MACWRITE. Click on FORMAT and see those two occurrences
in context. If they don't answer your questions, go back to MACWRITE, change
the Proximity neighborhood from "Words" to "Sentences" (or even "Paragraphs")
and shift-click again, to broaden out the selected subindex. Now check back
under FORMAT and see that 31 of the 987 occurrences are marked valid; browse
through them, and find the desired items.
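
The subindex steps in this example can be pictured as set operations on the
positions of indexed terms. The Python sketch below is only an illustration
under stated assumptions, not Browser's actual code: the Subindex class and
its method names are invented here, and the neighborhood is simplified to a
flat 32-byte radius (the real "Words" setting also takes in some terms out
to 64 bytes).

```python
class Subindex:
    """Hypothetical model of the working subindex as a set of offsets."""

    def __init__(self, all_positions):
        self.universe = frozenset(all_positions)
        self.valid = set(self.universe)   # "Fill" is the startup condition

    def empty(self):
        self.valid = set()

    def fill(self):
        self.valid = set(self.universe)

    def invert(self):
        # Boolean NOT: members leave, non-members come in.
        self.valid = set(self.universe) - self.valid

    def _neighborhood(self, clicked, radius):
        # Every indexed position within `radius` of any clicked occurrence.
        return {p for p in self.universe
                if any(abs(p - c) <= radius for c in clicked)}

    def shift_click(self, clicked, radius=32):
        # Boolean OR: add the neighborhoods of the clicked term.
        self.valid |= self._neighborhood(clicked, radius)

    def option_click(self, clicked, radius=32):
        # Boolean AND NOT: remove the neighborhoods of the clicked term.
        self.valid -= self._neighborhood(clicked, radius)

    def count_valid(self, positions):
        return sum(1 for p in positions if p in self.valid)

# Made-up offsets standing in for occurrences of two index terms.
macwrite = [100, 5000]
format_term = [120, 3000, 5020, 9000]

sub = Subindex(macwrite + format_term)
sub.empty()
sub.shift_click(macwrite)
print(sub.count_valid(format_term))   # prints 2
```

Passing a larger radius to shift_click() plays the role of switching the
Proximity setting from "Words" to "Sentences" or "Paragraphs": more
occurrences of the second term fall inside the marked neighborhoods.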
-----------------------------------------------------------------------------
Under the "Index" menu:
* select "New..." to start creating an inverted index to a text file
- index creation goes on in background while you can browse another file
(unless you select Fast Index option, in which case everything else
mostly locks up but indexing goes about 3 times faster)
- don't quit the main program while indexing is still occurring, or you'll
have to throw away the partially-sorted "....Index" file and the
"Temporary Radix Sort File" before running again
- don't attempt to index a file that already has an index, and don't index
a file which has a name longer than 25 letters
- be sure that you have at least 6 times the space free on your disk as
the length of the original text file to be indexed (the index file
occupies up to 3 times the space of the original text, and a temporary
sort file of that same size is also needed during indexing)
- use Omit Words options to leave out undesired terms from your indices
and thereby make them smaller (and get around the 6x space requirement
above) ... index sorting also goes faster when words are omitted
- the status window displays progress of an index operation: gray bar shows
amount of index-building scan completed, and black bar then shows
proportion of index-sorting that has been finished; the window updates
about once/second (when you're not in Fast Index mode)
- you can turn options like Fast Index on/off during indexing (though it
doesn't make much sense to change the Omit Words choices during the
course of index generation); hold down the mouse button and wait
for the disk to spin in order to get a chance to do something while
Fast Index is on
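
The 6x disk space rule above is simple arithmetic, sketched here in Python.
The function names are made up for the illustration, and the factor of 3 is
the worst case quoted above; indices built with Omit Words options may need
less.

```python
def free_space_needed(text_bytes, index_ratio=3):
    """Worst-case free disk space needed to index a text file."""
    index_bytes = index_ratio * text_bytes   # index: up to 3x the text
    temp_sort_bytes = index_bytes            # temporary sort file, same size
    return index_bytes + temp_sort_bytes     # 3x + 3x = 6x

def can_index(text_bytes, free_bytes):
    """True if the disk has room for both the index and the sort file."""
    return free_bytes >= free_space_needed(text_bytes)

# A 100 KB text file needs up to 600 KB free.
print(free_space_needed(100 * 1024))   # prints 614400
```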
-----------------------------------------------------------------------------
Under the "File" and "Edit" menus:
* the editor is a modified version of the Sibley Editor supplied with MacForth,
and copyright restrictions prevent me from sending out the source code
unless you are a MacForth purchaser (and if you have MacForth, you can
run the Browser from within MacForth with the full-up Forth interpreting
version of the Sibley Editor -- a powerful combination! One caveat: the
default Sibley Editor uses numerous OUTFILE commands and thus prevents
Browser windows from always updating properly ... if a window isn't
refreshed when you uncover it, click in it and scroll a bit and that
should cure the problem).
- editor commands are pretty standard; things get slow if the files being
edited are very big (over 50 KB or so), so stick to shorter files
- you have up to four windows available into four different text files
- don't launch another application while building an index, for the same
reason given earlier for not quitting while indexing is in progress
- "Margin" under the Edit menu re-word-wraps the current selection to fit the
current screen margins
- The editor is memory-based, so don't try to edit too big a file
- The editor is the newest thing in the package, so please watch out
carefully for bugs in it (and in its interactions with the rest of the
program ... for instance, you may have to click on another Browser
window before clicking on an editor window in order to get the "Edit"
menu activated, if a Desk Accessory was the front window ... I'll try
to fix that someday, if I can make it happen consistently).
-----------------------------------------------------------------------------
Send suggestions for improvements, and details of bugs, to:
Mark Zimmermann
9511 Gwyndale Drive
Silver Spring, MD 20910
phone (301)565-2166 (home) or (703)482-9572 (ofc, rarely in)
arpanet: science@nems.arpa
CompuServe: 75066,2044
-----------------------------------------------------------------------------
For users of earlier Browser releases:
WARNING -- in versions of Browser before v.224 there is a problem that may
cause some words to be omitted and others to be duplicated in big indices! It
is associated with the behavior of MacForth's WRITE.VIRTUAL command for file
I/O, and ONLY occurs with an index file when an alphanumeric character appears
more than 255 times in column 12 of the indexed word list. (For normal English
text this won't happen until the file being indexed is over 1.5 MB long, and
only 256 index terms will be messed up out of 80,000+.)
THUS -- if you are using a version prior to v.224, please send me a disk
and a self-addressed stamped envelope for a more recent release, if you need
to work with big files. If you have used an early version to make some big
index files, please throw those indices away and reindex.